RedwineQuality by Pasit Nusso

Analysis

Univariate Plots Section

## 'data.frame':    1599 obs. of  12 variables:
##  $ fixed.acidity       : num  7.4 7.8 7.8 11.2 7.4 7.4 7.9 7.3 7.8 7.5 ...
##  $ volatile.acidity    : num  0.7 0.88 0.76 0.28 0.7 0.66 0.6 0.65 0.58 0.5 ...
##  $ citric.acid         : num  0 0 0.04 0.56 0 0 0.06 0 0.02 0.36 ...
##  $ residual.sugar      : num  1.9 2.6 2.3 1.9 1.9 1.8 1.6 1.2 2 6.1 ...
##  $ chlorides           : num  0.076 0.098 0.092 0.075 0.076 0.075 0.069 0.065 0.073 0.071 ...
##  $ free.sulfur.dioxide : num  11 25 15 17 11 13 15 15 9 17 ...
##  $ total.sulfur.dioxide: num  34 67 54 60 34 40 59 21 18 102 ...
##  $ density             : num  0.998 0.997 0.997 0.998 0.998 ...
##  $ pH                  : num  3.51 3.2 3.26 3.16 3.51 3.51 3.3 3.39 3.36 3.35 ...
##  $ sulphates           : num  0.56 0.68 0.65 0.58 0.56 0.56 0.46 0.47 0.57 0.8 ...
##  $ alcohol             : num  9.4 9.8 9.8 9.8 9.4 9.4 9.4 10 9.5 10.5 ...
##  $ quality             : int  5 5 5 6 5 5 5 7 7 5 ...
##  fixed.acidity   volatile.acidity  citric.acid    residual.sugar  
##  Min.   : 4.60   Min.   :0.1200   Min.   :0.000   Min.   : 0.900  
##  1st Qu.: 7.10   1st Qu.:0.3900   1st Qu.:0.090   1st Qu.: 1.900  
##  Median : 7.90   Median :0.5200   Median :0.260   Median : 2.200  
##  Mean   : 8.32   Mean   :0.5278   Mean   :0.271   Mean   : 2.539  
##  3rd Qu.: 9.20   3rd Qu.:0.6400   3rd Qu.:0.420   3rd Qu.: 2.600  
##  Max.   :15.90   Max.   :1.5800   Max.   :1.000   Max.   :15.500  
##    chlorides       free.sulfur.dioxide total.sulfur.dioxide
##  Min.   :0.01200   Min.   : 1.00       Min.   :  6.00      
##  1st Qu.:0.07000   1st Qu.: 7.00       1st Qu.: 22.00      
##  Median :0.07900   Median :14.00       Median : 38.00      
##  Mean   :0.08747   Mean   :15.87       Mean   : 46.47      
##  3rd Qu.:0.09000   3rd Qu.:21.00       3rd Qu.: 62.00      
##  Max.   :0.61100   Max.   :72.00       Max.   :289.00      
##     density             pH          sulphates         alcohol     
##  Min.   :0.9901   Min.   :2.740   Min.   :0.3300   Min.   : 8.40  
##  1st Qu.:0.9956   1st Qu.:3.210   1st Qu.:0.5500   1st Qu.: 9.50  
##  Median :0.9968   Median :3.310   Median :0.6200   Median :10.20  
##  Mean   :0.9967   Mean   :3.311   Mean   :0.6581   Mean   :10.42  
##  3rd Qu.:0.9978   3rd Qu.:3.400   3rd Qu.:0.7300   3rd Qu.:11.10  
##  Max.   :1.0037   Max.   :4.010   Max.   :2.0000   Max.   :14.90  
##     quality     
##  Min.   :3.000  
##  1st Qu.:5.000  
##  Median :6.000  
##  Mean   :5.636  
##  3rd Qu.:6.000  
##  Max.   :8.000

The median quality is 6, and scores range from 3 to 8. The mean alcohol content is 10.42%.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    4.60    7.10    7.90    8.32    9.20   15.90

The fixed.acidity distribution is roughly normal with a slight right skew (mean 8.32 versus median 7.9).

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1200  0.3900  0.5200  0.5278  0.6400  1.5800

The volatile.acidity distribution is multimodal, with peaks near 0.4, 0.5 and 0.6.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   0.090   0.260   0.271   0.420   1.000
## feature
##    0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09  0.1 0.11 0.12 0.13 0.14 
##  132   33   50   30   29   20   24   22   33   30   35   15   27   18   21 
## 0.15 0.16 0.17 0.18 0.19  0.2 0.21 0.22 0.23 0.24 0.25 0.26 0.27 0.28 0.29 
##   19    9   16   22   21   25   33   27   25   51   27   38   20   19   21 
##  0.3 0.31 0.32 0.33 0.34 0.35 0.36 0.37 0.38 0.39  0.4 0.41 0.42 0.43 0.44 
##   30   30   32   25   24   13   20   19   14   28   29   16   29   15   23 
## 0.45 0.46 0.47 0.48 0.49  0.5 0.51 0.52 0.53 0.54 0.55 0.56 0.57 0.58 0.59 
##   22   19   18   23   68   20   13   17   14   13   12    8    9    9    8 
##  0.6 0.61 0.62 0.63 0.64 0.65 0.66 0.67 0.68 0.69  0.7 0.71 0.72 0.73 0.74 
##    9    2    1   10    9    7   14    2   11    4    2    1    1    3    4 
## 0.75 0.76 0.78 0.79    1 
##    1    3    1    1    1

The citric.acid distribution appears multimodal, with peaks at 0, 0.24 and 0.49.
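Peaks like these can be read off a frequency table. The following is a minimal sketch in base R, assuming only the first ten citric.acid values printed in the structure listing at the top of this report; the name `citric_head` is introduced here for illustration and is not from the original analysis.

```r
# First ten citric.acid values, copied from the str() listing in this report.
citric_head <- c(0, 0, 0.04, 0.56, 0, 0, 0.06, 0, 0.02, 0.36)

# Count how often each distinct value occurs, most frequent first.
counts <- sort(table(citric_head), decreasing = TRUE)

# The most frequent value is a candidate peak of the distribution.
top_value <- as.numeric(names(counts)[1])

print(counts)
print(top_value)
```

On the full data set the same table shows the largest counts at 0, 0.49 and 0.24.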

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.900   1.900   2.200   2.539   2.600  15.500
## feature
##  0.9  1.2  1.3  1.4  1.5  1.6 1.65  1.7 1.75  1.8  1.9    2 2.05  2.1 2.15 
##    2    8    5   35   30   58    2   76    2  129  117  156    2  128    2 
##  2.2 2.25  2.3 2.35  2.4  2.5 2.55  2.6 2.65  2.7  2.8 2.85  2.9 2.95    3 
##  131    1  109    1   86   84    1   79    1   39   49    1   24    1   25 
##  3.1  3.2  3.3  3.4 3.45  3.5  3.6 3.65  3.7 3.75  3.8  3.9    4  4.1  4.2 
##    7   15   11   15    1    2    8    1    4    1    8    6   11    6    5 
## 4.25  4.3  4.4  4.5  4.6 4.65  4.7  4.8    5  5.1 5.15  5.2  5.4  5.5  5.6 
##    1    8    4    4    6    2    1    3    1    5    1    3    1    8    6 
##  5.7  5.8  5.9    6  6.1  6.2  6.3  6.4 6.55  6.6  6.7    7  7.2  7.3  7.5 
##    1    4    3    4    4    3    2    3    2    2    2    1    1    1    1 
##  7.8  7.9  8.1  8.3  8.6  8.8  8.9    9 10.7   11 12.9 13.4 13.8 13.9 15.4 
##    2    3    2    3    1    2    1    1    1    2    1    1    2    1    2 
## 15.5 
##    1

## 90% 
## 3.6

The long-tailed data was transformed to better show the distribution of residual.sugar, which is right skewed. About 90% of the wines have residual.sugar of no more than 3.6 g/dm^3 (wines above 45 g/dm^3 are considered sweet).
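The report does not say which transformation was applied, so the sketch below shows two common options under that assumption: cutting the data at a quantile, and taking a log10 scale. It uses only the first ten residual.sugar values from the structure listing; `sugar_head`, `trimmed` and `logged` are illustrative names.

```r
# First ten residual.sugar values, copied from the str() listing in this report.
sugar_head <- c(1.9, 2.6, 2.3, 1.9, 1.9, 1.8, 1.6, 1.2, 2, 6.1)

# Cutoff at or below which 90% of the observations fall.
cutoff <- quantile(sugar_head, probs = 0.90)

# Option 1 (assumption): drop the tail beyond the cutoff before plotting.
trimmed <- sugar_head[sugar_head <= cutoff]

# Option 2 (assumption): keep every value but compress the tail with log10.
logged <- log10(sugar_head)

print(cutoff)
print(summary(trimmed))
print(summary(logged))
```

Either vector can then be passed to a histogram; the quantile cut keeps the original units, while the log scale keeps every observation.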

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.01200 0.07000 0.07900 0.08747 0.09000 0.61100
##    95% 
## 0.1261

The long-tailed data was transformed to better show the distribution of chlorides, which is right skewed. About 95% of the wines have chlorides below 0.1261.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    7.00   14.00   15.87   21.00   72.00
## 95% 
##  35

The long-tailed data was transformed to better show the distribution of free.sulfur.dioxide. The distribution is bimodal, with peaks at 7 and 17.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    6.00   22.00   38.00   46.47   62.00  289.00
##   95% 
## 112.1

The long-tailed data was transformed to better show the distribution of total.sulfur.dioxide. After the transformation the distribution appears roughly normal.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.740   3.210   3.310   3.311   3.400   4.010
## feature
## 2.74 2.86 2.87 2.88 2.89  2.9 2.92 2.93 2.94 2.95 2.98 2.99    3 3.01 3.02 
##    1    1    1    2    4    1    4    3    4    1    5    2    6    5    8 
## 3.03 3.04 3.05 3.06 3.07 3.08 3.09  3.1 3.11 3.12 3.13 3.14 3.15 3.16 3.17 
##    6   10    8   10   11   11   11   19    9   20   13   21   34   36   27 
## 3.18 3.19  3.2 3.21 3.22 3.23 3.24 3.25 3.26 3.27 3.28 3.29  3.3 3.31 3.32 
##   30   25   39   36   39   32   29   26   53   35   42   46   57   39   45 
## 3.33 3.34 3.35 3.36 3.37 3.38 3.39  3.4 3.41 3.42 3.43 3.44 3.45 3.46 3.47 
##   37   43   39   56   37   48   48   37   34   33   17   29   20   22   21 
## 3.48 3.49  3.5 3.51 3.52 3.53 3.54 3.55 3.56 3.57 3.58 3.59  3.6 3.61 3.62 
##   19   10   14   15   18   17   16    8   11   10   10    8    7    8    4 
## 3.63 3.66 3.67 3.68 3.69  3.7 3.71 3.72 3.74 3.75 3.78 3.85  3.9 4.01 
##    3    4    3    5    4    1    4    3    1    1    2    1    2    2

The pH distribution appears normal; the mean (3.311) and median (3.310) nearly coincide.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.9901  0.9956  0.9968  0.9967  0.9978  1.0040

The density distribution appears normal, and the difference between the minimum and maximum is only about 0.014 g/cm^3. (For comparison, the difference in density between ethanol and water is roughly 0.2 g/cm^3.)

Ref : https://en.wikipedia.org/wiki/Ethanol

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.3300  0.5500  0.6200  0.6581  0.7300  2.0000
##  95% 
## 0.93

The long-tailed data was transformed to better show the distribution of sulphates. After the transformation the distribution appears roughly normal. About 95% of the wines have sulphates below 0.93.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    8.40    9.50   10.20   10.42   11.10   14.90

The distribution for alcohol appears to be right skewed.

## feature
##   3   4   5   6   7   8 
##  10  53 681 638 199  18
## [1] 0.9493433

Most wines (94.9%) have a quality score between 5 and 7. I will convert this feature to a factor for the multivariate analysis.
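The 94.9% figure and the factor conversion can be reproduced from the counts in the table above. This is a sketch rather than the original code; `quality_counts`, `share_5_to_7` and `quality_factor` are names introduced here.

```r
# Quality counts copied from the frequency table above (scores 3 to 8).
quality_counts <- c("3" = 10, "4" = 53, "5" = 681, "6" = 638, "7" = 199, "8" = 18)

# Rebuild the full quality vector from the counts.
quality <- rep(as.integer(names(quality_counts)), times = quality_counts)

# Share of wines scored 5, 6 or 7.
share_5_to_7 <- mean(quality >= 5 & quality <= 7)
print(share_5_to_7)

# Treat quality as an ordered category rather than a number.
quality_factor <- factor(quality, levels = 3:8, ordered = TRUE)
print(table(quality_factor))
```

An ordered factor keeps the ranking of the scores while letting plots treat each score as its own group.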

Univariate Analysis

What is the structure of your dataset?

ANS: There are 1599 wines in the data set, with 12 features.

##   fixed.acidity volatile.acidity citric.acid residual.sugar chlorides
## 1           7.4             0.70        0.00            1.9     0.076
## 2           7.8             0.88        0.00            2.6     0.098
## 3           7.8             0.76        0.04            2.3     0.092
## 4          11.2             0.28        0.56            1.9     0.075
## 5           7.4             0.70        0.00            1.9     0.076
## 6           7.4             0.66        0.00            1.8     0.075
##   free.sulfur.dioxide total.sulfur.dioxide density   pH sulphates alcohol
## 1                  11                   34  0.9978 3.51      0.56     9.4
## 2                  25                   67  0.9968 3.20      0.68     9.8
## 3                  15                   54  0.9970 3.26      0.65     9.8
## 4                  17                   60  0.9980 3.16      0.58     9.8
## 5                  11                   34  0.9978 3.51      0.56     9.4
## 6                  13                   40  0.9978 3.51      0.56     9.4
##   quality
## 1       5
## 2       5
## 3       5
## 4       6
## 5       5
## 6       5

Input variables (based on physicochemical tests):

  1. fixed acidity (tartaric acid - g / dm^3): most acids involved with wine are fixed or nonvolatile (do not evaporate readily)
  2. volatile acidity (acetic acid - g / dm^3): the amount of acetic acid in wine, which at too high of levels can lead to an unpleasant, vinegar taste
  3. citric acid (g / dm^3): found in small quantities, citric acid can add ‘freshness’ and flavor to wines
  4. residual sugar (g / dm^3): the amount of sugar remaining after fermentation stops; it’s rare to find wines with less than 1 gram/liter, and wines with greater than 45 grams/liter are considered sweet
  5. chlorides (sodium chloride - g / dm^3): the amount of salt in the wine
  6. free sulfur dioxide (mg / dm^3): the free form of SO2 exists in equilibrium between molecular SO2 (as a dissolved gas) and bisulfite ion; it prevents microbial growth and the oxidation of wine
  7. total sulfur dioxide (mg / dm^3): amount of free and bound forms of SO2; in low concentrations, SO2 is mostly undetectable in wine, but at free SO2 concentrations over 50 ppm, SO2 becomes evident in the nose and taste of wine
  8. density (g / cm^3)
  9. pH: describes how acidic or basic a wine is on a scale from 0 (very acidic) to 14 (very basic); most wines are between 3-4 on the pH scale
  10. sulphates (potassium sulphate - g / dm^3): a wine additive which can contribute to sulfur dioxide gas (SO2) levels, which acts as an antimicrobial and antioxidant
  11. alcohol (% by volume): the percent alcohol content of the wine

Output variable (based on sensory data):

  12. quality (score between 0 and 10)

What is/are the main feature(s) of interest in your dataset?

ANS: The main feature of interest is wine quality. I would like to investigate which variables affect it.

What other features in the dataset do you think will help support your investigation into your feature(s) of interest?

ANS: I think smell, taste, touch and addictive content will affect the wine’s quality, so the features I chose for investigation are:

  1. sulphates
  2. volatile acidity
  3. citric acid
  4. chlorides
  5. sum of acids (tartaric acid + citric acid) as “sourness”
  6. alcohol
##   fixed.acidity volatile.acidity citric.acid residual.sugar chlorides
## 1           7.4             0.70        0.00            1.9     0.076
## 2           7.8             0.88        0.00            2.6     0.098
## 3           7.8             0.76        0.04            2.3     0.092
## 4          11.2             0.28        0.56            1.9     0.075
## 5           7.4             0.70        0.00            1.9     0.076
## 6           7.4             0.66        0.00            1.8     0.075
##   free.sulfur.dioxide total.sulfur.dioxide density   pH sulphates alcohol
## 1                  11                   34  0.9978 3.51      0.56     9.4
## 2                  25                   67  0.9968 3.20      0.68     9.8
## 3                  15                   54  0.9970 3.26      0.65     9.8
## 4                  17                   60  0.9980 3.16      0.58     9.8
## 5                  11                   34  0.9978 3.51      0.56     9.4
## 6                  13                   40  0.9978 3.51      0.56     9.4
##   quality sourness
## 1       5   5.1800
## 2       5   5.4600
## 3       5   5.4784
## 4       6   8.0976
## 5       5   5.1800
## 6       5   5.1800

Did you create any new variables from existing variables in the dataset?

ANS: Yes, I created “sourness” from fixed.acidity and citric.acid to represent the sourness of the wine.
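The report does not give the formula for sourness. The printed values are, however, consistent with a weighted sum of the two columns, so the sketch below assumes weights of 0.7 for fixed.acidity and 0.46 for citric.acid. Those weights are inferred from the six printed rows only and may not be what the original code used.

```r
# First six rows of the two source columns, copied from the head() output.
wine_head <- data.frame(
  fixed.acidity = c(7.4, 7.8, 7.8, 11.2, 7.4, 7.4),
  citric.acid   = c(0.00, 0.00, 0.04, 0.56, 0.00, 0.00)
)

# ASSUMPTION: the weights 0.7 and 0.46 are inferred from the printed values,
# not stated anywhere in the report.
wine_head$sourness <- 0.7 * wine_head$fixed.acidity + 0.46 * wine_head$citric.acid

# Compare with the sourness column printed in the report.
printed <- c(5.1800, 5.4600, 5.4784, 8.0976, 5.1800, 5.1800)
print(all.equal(wine_head$sourness, printed))
```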

Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?

ANS: The distributions of citric.acid, volatile.acidity and free.sulfur.dioxide each show more than one peak. I tidied the data by removing the X feature, which I am not interested in, and combined fixed.acidity and citric.acid into sourness for the next investigation.

Bivariate Plots Section

##   [1] 1255  840  538  820  834  888 1213  921   90  300 1152  901 1410  425
##  [15]  599 1367  661  416  564  950  994   85 1530  590  496 1092 1509  333
##  [29]  982 1531  721 1551  344  984  847 1576  753  349  618  454 1159  442
##  [43]   92  838   32  302 1313 1505 1125  365 1420  307  226  943  893  736
##  [57] 1412  218 1466 1042 1157 1575  700  481  331 1204  183 1436   10  802
##  [71]  709  735  730  817 1342 1447  194   40  848 1194  731  827  159 1584
##  [85] 1515  592  829 1353 1054  312  151  685 1181  609 1035  535  992  515
##  [99]  136 1563

Top correlation values for quality are:

  1. alcohol : 0.476
  2. volatile.acidity : -0.391
  3. sulphates : 0.251
  4. citric acid : 0.226

## 
##  Pearson's product-moment correlation
## 
## data:  feature and quality
## t = -16.954, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.4313210 -0.3482032
## sample estimates:
##        cor 
## -0.3905578

WineQuality.vs.citric.acid

## 
##  Pearson's product-moment correlation
## 
## data:  feature and quality
## t = 9.2875, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1793415 0.2723711
## sample estimates:
##       cor 
## 0.2263725

WineQuality.vs.sulphates

## 
##  Pearson's product-moment correlation
## 
## data:  feature and quality
## t = 10.38, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2049011 0.2967610
## sample estimates:
##       cor 
## 0.2513971

WineQuality.vs.alcohol

## 
##  Pearson's product-moment correlation
## 
## data:  feature and quality
## t = 21.639, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4373540 0.5132081
## sample estimates:
##       cor 
## 0.4761663
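As a check on how the pieces of this output fit together, the t statistic and the 95% confidence interval can be recomputed from the reported correlation and degrees of freedom. The sketch below uses the standard relation t = r * sqrt(df) / sqrt(1 - r^2) and the Fisher z-transform for the interval; the variable names are illustrative.

```r
# Values reported by cor.test() for quality vs alcohol in this report.
r  <- 0.4761663
df <- 1597

# t statistic for a Pearson correlation.
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# 95% confidence interval via the Fisher z-transform (n = df + 2).
n  <- df + 2
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

print(round(t_stat, 3))
print(round(ci, 4))
```

The recomputed values should agree with the t statistic and interval printed above up to rounding.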

Bivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?

ANS: From the plots and correlation values, sulphates, citric acid and alcohol relate positively to quality, while volatile acidity relates negatively.

The alcohol, sulphates and volatile acidity plots separate the three wine ratings very well, but citric acid separates normal and good wine poorly.

Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?

## 
##  Pearson's product-moment correlation
## 
## data:  featureX and featureY
## t = -26.489, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.5856550 -0.5174902
## sample estimates:
##        cor 
## -0.5524957

## 
##  Pearson's product-moment correlation
## 
## data:  featureX and featureY
## t = 13.159, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2678558 0.3563278
## sample estimates:
##     cor 
## 0.31277

## 
##  Pearson's product-moment correlation
## 
## data:  featureX and featureY
## t = 4.4188, df = 1597, p-value = 1.059e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.06121189 0.15807276
## sample estimates:
##       cor 
## 0.1099032

ANS: I found that citric acid and volatile acidity are fairly strongly negatively correlated.

High correlation:

citric acid and volatile acidity : -0.5524957
citric acid and sulphates : 0.31277

Low correlation:

citric acid and alcohol : 0.1099032

What was the strongest relationship you found?

ANS: For the feature of interest, alcohol percentage has the highest correlation value (0.476).
Across every pair of features, free.sulfur.dioxide and total.sulfur.dioxide have the highest correlation value (0.66).

Multivariate Plots Section

First I need to prepare alcohol.level for the multivariate plots.
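The report does not show how alcohol.level was built. One plausible way is to bin alcohol with `cut()`. The sketch below assumes breaks at the quartiles from the alcohol summary (8.4, 9.5, 10.2, 11.1, 14.9); the labels “low”, “medium-low” and “medium” appear later in this report, while “high” and the break points themselves are hypothetical.

```r
# First ten alcohol values, copied from the str() listing in this report.
alcohol_head <- c(9.4, 9.8, 9.8, 9.8, 9.4, 9.4, 9.4, 10, 9.5, 10.5)

# ASSUMPTION: breaks at the quartiles from the alcohol summary; the label
# "high" is also hypothetical.
alcohol.level <- cut(
  alcohol_head,
  breaks = c(8.4, 9.5, 10.2, 11.1, 14.9),
  labels = c("low", "medium-low", "medium", "high"),
  include.lowest = TRUE
)

print(table(alcohol.level))
```

Quartile breaks give groups of roughly equal size on the full data, which keeps each panel of a faceted plot reasonably populated.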

The plot shows that excellent wine mostly sits in the top left, good wine in the middle and normal wine in the bottom right.

The Wine rating.vs.alcohol.level.vs.volatile.acidity plot shows that the ratio of excellent wine in alcohol grade “medium” is very high in the volatile.acidity range 0.25-0.4.

##  [1] "fixed.acidity"        "volatile.acidity"     "citric.acid"         
##  [4] "residual.sugar"       "chlorides"            "free.sulfur.dioxide" 
##  [7] "total.sulfur.dioxide" "density"              "pH"                  
## [10] "sulphates"            "alcohol"              "quality"             
## [13] "sourness"             "wine_rating"          "alcohol.level"

The Wine rating.vs.alcohol.level.vs.citric.acid plot shows that the ratio of excellent wine in alcohol grade “medium” is very high at citric.acid values of 0 and 0.4.

The Wine rating.vs.alcohol.level.vs.total.sulfur.dioxide plot shows that the ratio of excellent wine in alcohol grade “medium” is very high for total.sulfur.dioxide between 5 and 30.

The Wine rating.vs.alcohol.level.vs.sulphates plot shows that the ratio of excellent wine in alcohol grade “medium” is very high in the sulphates range 0.7-0.9.

No pattern is noticeable here.

Multivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?

The plots show that alcohol is the feature with the highest impact.

The Wine rating.vs.alcohol.level.vs.volatile.acidity plot shows that the ratio of excellent wine in alcohol grade “medium” is very high in the volatile.acidity range 0.25-0.4.
The Wine rating.vs.alcohol.level.vs.citric.acid plot shows that the ratio of excellent wine in alcohol grade “medium” is very high at citric.acid values of 0 and 0.35-0.5.
The Wine rating.vs.alcohol.level.vs.total.sulfur.dioxide plot shows that the ratio of excellent wine in alcohol grade “medium” is very high for total.sulfur.dioxide between 5 and 30.
The Wine rating.vs.alcohol.level.vs.sulphates plot shows that the ratio of excellent wine in alcohol grade “medium” is very high in the sulphates range 0.7-0.9.
The Wine rating.vs.alcohol.level.vs.sourness.vs.chlorides plot shows that it is hard to determine wine quality by taste (chlorides and sourness).

Were there any interesting or surprising interactions between features?

It is very surprising that smell (total.sulfur.dioxide) has influence over the wine rating but taste (chlorides and sourness) does not.


Final Plots and Summary

Plot One

Alcohol Distribution

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    8.40    9.50   10.20   10.42   11.10   14.90

Description One

The distribution of alcohol appears right skewed, peaking at 9.5% and sloping down to 13.0%, with very few wines between 8.0% and 9.0%. Demand may tend to be lower for wines with a higher alcohol percentage, peaking at 9.5%.

Plot Two

How does each feature influence the wine rating?

##   [1]  341 1459  740  310  746  485   16   57  963  571  582  630  686   79
##  [15]  866 1433 1167 1022 1337  440 1250  851  801 1430 1126  781 1589  780
##  [29] 1441  625  559  430  593  565 1590 1055 1056 1252  240 1113  752  226
##  [43] 1583 1072 1573  615  246 1203  289  465 1321  687  779  365  111  177
##  [57] 1461  671 1283  529 1189  743 1272  325  931  628  617 1201  526 1074
##  [71] 1284 1175  962   34  668   72  109 1384  626 1247    5 1157  783  663
##  [85] 1524  371  827  224 1460  698 1052 1436 1307  584  494  951  926  203
##  [99] 1061  608

## 
##  Pearson's product-moment correlation
## 
## data:  redwine_data$quality and redwine_data$alcohol
## t = 21.639, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4373540 0.5132081
## sample estimates:
##       cor 
## 0.4761663
## 
##  Pearson's product-moment correlation
## 
## data:  redwine_data$quality and redwine_data$volatile.acidity
## t = -16.954, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.4313210 -0.3482032
## sample estimates:
##        cor 
## -0.3905578
## 
##  Pearson's product-moment correlation
## 
## data:  redwine_data$quality and redwine_data$sulphates
## t = 10.38, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2049011 0.2967610
## sample estimates:
##       cor 
## 0.2513971
## 
##  Pearson's product-moment correlation
## 
## data:  redwine_data$quality and redwine_data$citric.acid
## t = 9.2875, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1793415 0.2723711
## sample estimates:
##       cor 
## 0.2263725

Description Two

Alcohol percentage, sulphates and citric acid correlate positively with wine rating, while volatile acidity correlates negatively.
The alcohol variance in the good and excellent ratings is much larger than in the normal rating.
The citric acid variance in the good and normal ratings is much larger than in the excellent rating.

Top correlation values for quality are:

  1. alcohol : 0.476
  2. volatile.acidity : -0.391
  3. sulphates : 0.251
  4. citric acid : 0.226

Surprisingly, volatile.acidity and citric acid are strongly negatively correlated.

Plot Three

Where is excellent wine?

## [1] "On any alcohol  level data - wine rating vs volatile.acidity"
## redwin_data_at.alcoholitem$wine_rating: normal
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1800  0.4600  0.5900  0.5895  0.6800  1.5800 
## -------------------------------------------------------- 
## redwin_data_at.alcoholitem$wine_rating: good
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1600  0.3800  0.4900  0.4975  0.6000  1.0400 
## -------------------------------------------------------- 
## redwin_data_at.alcoholitem$wine_rating: excellent
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1200  0.3000  0.3700  0.4055  0.4900  0.9150
## [1] "On low alcohol  level data - wine rating vs volatile.acidity"
## redwin_data_at.alcoholitem$wine_rating: normal
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1900  0.4800  0.5900  0.5866  0.6700  1.2400 
## -------------------------------------------------------- 
## redwin_data_at.alcoholitem$wine_rating: good
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.2200  0.4000  0.5200  0.5173  0.6200  1.0400 
## -------------------------------------------------------- 
## redwin_data_at.alcoholitem$wine_rating: excellent
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.2100  0.2775  0.3450  0.3633  0.4325  0.5800 
## [1] "On medium-low alcohol  level data - wine rating vs volatile.acidity"
## redwin_data_at.alcoholitem$wine_rating: normal
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1800  0.4300  0.5800  0.5956  0.6950  1.5800 
## -------------------------------------------------------- 
## redwin_data_at.alcoholitem$wine_rating: good
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1600  0.3800  0.4950  0.4965  0.6000  1.0200 
## -------------------------------------------------------- 
## redwin_data_at.alcoholitem$wine_rating: excellent
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.2100  0.3100  0.3600  0.3963  0.4800  0.9150 
## [1] "On medium alcohol  level data - wine rating vs volatile.acidity"
## redwin_data_at.alcoholitem$wine_rating: normal
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.2600  0.5000  0.5900  0.5858  0.6675  1.0400 
## -------------------------------------------------------- 
## redwin_data_at.alcoholitem$wine_rating: good
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1600  0.3525  0.4400  0.4720  0.5800  1.0100 
## -------------------------------------------------------- 
## redwin_data_at.alcoholitem$wine_rating: excellent
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1200  0.3075  0.3800  0.4170  0.5100  0.8500
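The grouped summaries above have the layout that base R's `by()` prints when `summary()` is applied per group. The sketch below reproduces that layout on the first ten rows from the structure listing. The rating thresholds (5 or below is normal, 6 is good, 7 or above is excellent) are hypothetical, since the report does not say how wine_rating was derived from quality.

```r
# First ten rows of two columns, copied from the str() listing in this report.
wine_head <- data.frame(
  volatile.acidity = c(0.7, 0.88, 0.76, 0.28, 0.7, 0.66, 0.6, 0.65, 0.58, 0.5),
  quality          = c(5, 5, 5, 6, 5, 5, 5, 7, 7, 5)
)

# ASSUMPTION: hypothetical thresholds (<= 5 normal, 6 good, >= 7 excellent);
# the report does not say how wine_rating was derived from quality.
wine_head$wine_rating <- cut(
  wine_head$quality,
  breaks = c(-Inf, 5, 6, Inf),
  labels = c("normal", "good", "excellent")
)

# One summary() per rating group, printed in the same layout as above.
grouped <- by(wine_head$volatile.acidity, wine_head$wine_rating, summary)
print(grouped)
```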

## [1] "On any alcohol  level data - wine rating vs citric.acid"
## redwin_data_at.alcoholitem$wine_rating: normal
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.0800  0.2200  0.2378  0.3600  1.0000 
## -------------------------------------------------------- 
## redwin_data_at.alcoholitem$wine_rating: good
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.0900  0.2600  0.2738  0.4300  0.7800 
## -------------------------------------------------------- 
## redwin_data_at.alcoholitem$wine_rating: excellent
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.3000  0.4000  0.3765  0.4900  0.7600
## [1] "On low alcohol  level data - wine rating vs citric.acid"
## redwin_data_at.alcoholitem$wine_rating: normal
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.0900  0.2200  0.2372  0.3400  1.0000 
## -------------------------------------------------------- 
## redwin_data_at.alcoholitem$wine_rating: good
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.1000  0.2400  0.2540  0.3825  0.7400 
## -------------------------------------------------------- 
## redwin_data_at.alcoholitem$wine_rating: excellent
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0200  0.3250  0.4650  0.4325  0.5400  0.7200 
## [1] "On medium-low alcohol  level data - wine rating vs citric.acid"
## redwin_data_at.alcoholitem$wine_rating: normal
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.0725  0.2350  0.2417  0.3900  0.7400 
## -------------------------------------------------------- 
## redwin_data_at.alcoholitem$wine_rating: good
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.0800  0.2650  0.2730  0.4325  0.7800 
## -------------------------------------------------------- 
## redwin_data_at.alcoholitem$wine_rating: excellent
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.3000  0.3900  0.3769  0.4900  0.7600 
## [1] "On medium alcohol  level data - wine rating vs citric.acid"
## redwin_data_at.alcoholitem$wine_rating: normal
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.0650  0.1000  0.2104  0.3100  0.7900 
## -------------------------------------------------------- 
## redwin_data_at.alcoholitem$wine_rating: good
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.1200  0.3250  0.3033  0.4600  0.6900 
## -------------------------------------------------------- 
## redwin_data_at.alcoholitem$wine_rating: excellent
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.3150  0.4000  0.3704  0.4900  0.7600

## [1] "On any alcohol  level data - wine rating vs total.sulfur.dioxide"
## redwin_data_at.alcoholitem$wine_rating: normal
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    6.00   23.75   45.00   54.65   78.00  155.00 
## -------------------------------------------------------- 
## redwin_data_at.alcoholitem$wine_rating: good
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    6.00   23.00   35.00   40.87   54.00  165.00 
## -------------------------------------------------------- 
## redwin_data_at.alcoholitem$wine_rating: excellent
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    7.00   17.00   27.00   34.89   43.00  289.00
## [1] "On low alcohol  level data - wine rating vs total.sulfur.dioxide"
## redwin_data_at.alcoholitem$wine_rating: normal
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    8.00   30.00   53.00   61.39   88.00  153.00 
## -------------------------------------------------------- 
## redwin_data_at.alcoholitem$wine_rating: good
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    8.00   25.00   38.00   45.30   63.25  136.00 
## -------------------------------------------------------- 
## redwin_data_at.alcoholitem$wine_rating: excellent
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   10.00   23.25   29.00   28.92   36.50   45.00 
## [1] "On medium-low alcohol  level data - wine rating vs total.sulfur.dioxide"
## redwin_data_at.alcoholitem$wine_rating: normal
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    6.00   20.00   32.50   42.91   57.75  155.00 
## -------------------------------------------------------- 
## redwin_data_at.alcoholitem$wine_rating: good
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    7.00   22.00   35.00   38.93   52.00  160.00 
## -------------------------------------------------------- 
## redwin_data_at.alcoholitem$wine_rating: excellent
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    7.00   19.00   28.00   31.57   43.00  103.00 
## [1] "On medium alcohol  level data - wine rating vs total.sulfur.dioxide"
## redwin_data_at.alcoholitem$wine_rating: normal
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    7.00   15.25   25.50   40.35   61.75  113.00 
## -------------------------------------------------------- 
## redwin_data_at.alcoholitem$wine_rating: good
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    6.00   18.00   33.00   39.04   47.75  165.00 
## -------------------------------------------------------- 
## redwin_data_at.alcoholitem$wine_rating: excellent
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    8.00   14.75   25.00   38.05   45.75  289.00

## [1] "On any alcohol  level data - wine rating vs sulphates"
## redwin_data_at.alcoholitem$wine_rating: normal
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.3300  0.5200  0.5800  0.6185  0.6500  2.0000 
## -------------------------------------------------------- 
## redwin_data_at.alcoholitem$wine_rating: good
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.4000  0.5800  0.6400  0.6753  0.7500  1.9500 
## -------------------------------------------------------- 
## redwin_data_at.alcoholitem$wine_rating: excellent
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.3900  0.6500  0.7400  0.7435  0.8200  1.3600
## [1] "On low alcohol  level data - wine rating vs sulphates"
## redwin_data_at.alcoholitem$wine_rating: normal
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.3300  0.5200  0.5700  0.6216  0.6400  2.0000 
## -------------------------------------------------------- 
## redwin_data_at.alcoholitem$wine_rating: good
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.4300  0.5775  0.6250  0.6796  0.7200  1.9500 
## -------------------------------------------------------- 
## redwin_data_at.alcoholitem$wine_rating: excellent
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.5700  0.6500  0.7600  0.8033  0.8400  1.3600 
## [1] "On medium-low alcohol  level data - wine rating vs sulphates"
## redwin_data_at.alcoholitem$wine_rating: normal
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.3700  0.5400  0.5900  0.6158  0.6700  1.2000 
## -------------------------------------------------------- 
## redwin_data_at.alcoholitem$wine_rating: good
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.4200  0.6000  0.6500  0.6851  0.7600  1.3600 
## -------------------------------------------------------- 
## redwin_data_at.alcoholitem$wine_rating: excellent
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.4700  0.6700  0.7400  0.7517  0.8300  1.1000 
## [1] "On medium alcohol  level data - wine rating vs sulphates"
## redwin_data_at.alcoholitem$wine_rating: normal
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.3700  0.5200  0.5600  0.5888  0.6275  0.8400 
## -------------------------------------------------------- 
## redwin_data_at.alcoholitem$wine_rating: good
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.4000  0.5600  0.6200  0.6475  0.7200  1.0300 
## -------------------------------------------------------- 
## redwin_data_at.alcoholitem$wine_rating: excellent
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.3900  0.6400  0.7400  0.7309  0.8125  1.1300

Description Three

At low alcohol percentages, excellent wines are rare; most wines are rated normal. At medium-low alcohol percentages, 75% of the excellent wines have total.sulfur.dioxide below 52 mg/dm^3 or volatile.acidity below 0.6 g/dm^3, but most wines in this group are rated normal or good.
At medium alcohol percentages, 75% of the excellent wines have total.sulfur.dioxide below 45.75 mg/dm^3 or sulphates above 0.64 g/dm^3, and most wines in this group are rated excellent or good.
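
The "75% of excellent wines" thresholds are quartiles read off per-rating summaries. The sketch below shows one way such figures could be produced with base R's by() and tapply(). It is an illustration, not the code behind this report: the data frame `wine` holds only the first ten rows visible in the str() output, and the names `so2_by_rating`, `q3_so2` and `q1_sulphates` are hypothetical.

```r
# Stand-in data: the first ten rows as printed by str() in this report.
wine <- data.frame(
  total.sulfur.dioxide = c(34, 67, 54, 60, 34, 40, 59, 21, 18, 102),
  sulphates = c(0.56, 0.68, 0.65, 0.58, 0.56, 0.56, 0.46, 0.47, 0.57, 0.80),
  wine_rating = factor(
    c("normal", "normal", "normal", "good", "normal",
      "normal", "normal", "excellent", "excellent", "normal"),
    levels = c("normal", "good", "excellent"), ordered = TRUE
  )
)

# Min, quartiles, median, mean and max of total SO2 within each rating group.
so2_by_rating <- by(wine$total.sulfur.dioxide, wine$wine_rating, summary)
print(so2_by_rating)

# Third quartile per rating: 75% of a group lies at or below this value.
q3_so2 <- tapply(wine$total.sulfur.dioxide, wine$wine_rating,
                 quantile, probs = 0.75)

# First quartile per rating: 75% of a group lies at or above this value.
q1_sulphates <- tapply(wine$sulphates, wine$wine_rating,
                       quantile, probs = 0.25)
```

An upper bound such as "total.sulfur.dioxide below 45.75" corresponds to a third quartile, while a lower bound such as "sulphates above 0.64" corresponds to a first quartile.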


Reflection

## 'data.frame':    1599 obs. of  15 variables:
##  $ fixed.acidity       : num  7.4 7.8 7.8 11.2 7.4 7.4 7.9 7.3 7.8 7.5 ...
##  $ volatile.acidity    : num  0.7 0.88 0.76 0.28 0.7 0.66 0.6 0.65 0.58 0.5 ...
##  $ citric.acid         : num  0 0 0.04 0.56 0 0 0.06 0 0.02 0.36 ...
##  $ residual.sugar      : num  1.9 2.6 2.3 1.9 1.9 1.8 1.6 1.2 2 6.1 ...
##  $ chlorides           : num  0.076 0.098 0.092 0.075 0.076 0.075 0.069 0.065 0.073 0.071 ...
##  $ free.sulfur.dioxide : num  11 25 15 17 11 13 15 15 9 17 ...
##  $ total.sulfur.dioxide: num  34 67 54 60 34 40 59 21 18 102 ...
##  $ density             : num  0.998 0.997 0.997 0.998 0.998 ...
##  $ pH                  : num  3.51 3.2 3.26 3.16 3.51 3.51 3.3 3.39 3.36 3.35 ...
##  $ sulphates           : num  0.56 0.68 0.65 0.58 0.56 0.56 0.46 0.47 0.57 0.8 ...
##  $ alcohol             : num  9.4 9.8 9.8 9.8 9.4 9.4 9.4 10 9.5 10.5 ...
##  $ quality             : int  5 5 5 6 5 5 5 7 7 5 ...
##  $ sourness            : num  5.18 5.46 5.48 8.1 5.18 ...
##  $ wine_rating         : Ord.factor w/ 3 levels "normal"<"good"<..: 1 1 1 2 1 1 1 3 3 1 ...
##  $ alcohol.level       : Ord.factor w/ 3 levels "low alcohol"<..: 1 1 1 1 1 1 1 2 1 2 ...

The data set contains 1599 red wines from 2009. I started by understanding the variables in the data set and tried to interpret them in terms of what a human can perceive. Surprisingly, I found no evidence that the sour and salty tastes influence wine quality, whereas smell (total sulfur dioxide), addictive content (alcohol), volatile acidity, citric acid and sulphates do.

At low alcohol percentages, excellent wines are hard to find; most are rated normal, with some good wines among those with total SO2 in the range 5-60 and sulphates in the range 0.53-0.73. At medium-low alcohol percentages, excellent wines are found at low volatile acidity and total sulfur dioxide below 55, but most wines are rated normal or good. At medium alcohol percentages, a high share of excellent wines is found at total sulfur dioxide below 50 and sulphates above 0.65, and most wines are rated excellent or good.

At first I struggled to build multivariate plots that clearly show how more than one feature relates to wine quality. I eventually found that creating new variables that represent a feature as groups makes this easier. I also could not present the relationships of the selected features clearly with geom_point, because the correlation values for each feature are not very high, but the picture became much clearer once I switched to histograms.
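
The grouping variables mentioned above could be built with cut(), as in the sketch below. The break points are assumptions chosen only to reproduce the ten rows shown in the str() output; the report does not state the actual cut points. The level names "medium-low alcohol" and "medium alcohol" are inferred from the printed section labels.

```r
# Stand-in data: the first ten rows as printed by str() in this report.
wine <- data.frame(
  alcohol = c(9.4, 9.8, 9.8, 9.8, 9.4, 9.4, 9.4, 10, 9.5, 10.5),
  quality = c(5L, 5L, 5L, 6L, 5L, 5L, 5L, 7L, 7L, 5L)
)

# ASSUMPTION: quality <= 5 is "normal", 6 is "good", >= 7 is "excellent".
wine$wine_rating <- cut(
  wine$quality,
  breaks = c(-Inf, 5, 6, Inf),
  labels = c("normal", "good", "excellent"),
  ordered_result = TRUE
)

# ASSUMPTION: hypothetical alcohol break points at 9.9 and 11 percent.
wine$alcohol.level <- cut(
  wine$alcohol,
  breaks = c(-Inf, 9.9, 11, Inf),
  labels = c("low alcohol", "medium-low alcohol", "medium alcohol"),
  ordered_result = TRUE
)

# Cross-tabulate the two grouping variables to see how ratings are spread.
print(table(wine$alcohol.level, wine$wine_rating))
```

Once both features are ordered factors, each combination of alcohol level and rating can be summarised or plotted as its own group.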

Finally, I really enjoyed doing this analysis. It helped me better understand how to frame questions and answer them.